Prediction
Predictions
class eole.predict.prediction.Prediction(src, srclen, pred_sents, attn, pred_scores, estim, tgt_sent, gold_score, word_aligns, ind_in_bucket)
Bases: object
Container for a predicted sentence.
- Variables:
- src (LongTensor) – Source word IDs.
- srclen (List[int]) – Source lengths.
- pred_sents (List[List[str]]) – Words from the n-best predictions.
- pred_scores (List[List[float]]) – Log-probs of the n-best predictions.
- attns (List[FloatTensor]) – Attention distribution for each prediction.
- gold_sent (List[str]) – Words from the gold prediction.
- gold_score (List[float]) – Log-prob of the gold prediction.
- word_aligns (List[FloatTensor]) – Word alignment distribution for each prediction.
log(sent_number, src_raw='')
Log prediction.
class eole.predict.prediction.PredictionBuilder(vocabs, n_best=1, replace_unk=False, phrase_table='', tgt_eos_idx=None, id_tokenization=False)
Bases: object
Build a word-based prediction from the batch output of the predictor and the underlying dictionaries.
Replacement is based on “Addressing the Rare Word Problem in Neural Machine Translation”.
- Parameters:
- vocabs – Vocabulary dictionaries.
- n_best (int) – Number of predictions produced.
- replace_unk (bool) – Replace unknown words using attention.
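The idea behind attention-based unknown-word replacement can be sketched as follows: each predicted `<unk>` is swapped for the source token that received the most attention at that decoding step. This is an illustrative sketch over plain Python lists, not EOLE's actual implementation; all names here are hypothetical.

```python
def replace_unk_tokens(pred_tokens, src_tokens, attn, unk_token="<unk>"):
    """attn[i][j] is the attention weight on source token j at output step i."""
    out = []
    for i, tok in enumerate(pred_tokens):
        if tok == unk_token:
            # Pick the source position with the highest attention weight.
            j = max(range(len(src_tokens)), key=lambda k: attn[i][k])
            out.append(src_tokens[j])
        else:
            out.append(tok)
    return out

pred = ["the", "<unk>", "is", "blue"]
src = ["le", "ciel", "est", "bleu"]
attn = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.8, 0.05, 0.05],
    [0.1, 0.1, 0.7, 0.1],
    [0.05, 0.05, 0.1, 0.8],
]
print(replace_unk_tokens(pred, src, attn))  # ['the', 'ciel', 'is', 'blue']
```

Copying the attended source token verbatim works best for names and rare entities, which is the use case targeted by the cited paper.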
Predictor Classes
class eole.predict.inference.Inference(model, vocabs, config, model_config, device_id=0, global_scorer=None, report_score=True, logger=None, return_gold_log_probs=False)
Bases: object
Predict a batch of sentences with a saved model.
- Parameters:
- model (eole.modules.BaseModel) – Model to use for prediction
- vocabs (dict[str, Vocab]) – A dict mapping each side to its Vocab.
- config
- model_config
- device_id
- global_scorer (eole.predict.GNMTGlobalScorer) – Prediction scoring/reranking object.
- report_score (bool) – Whether to report scores
- logger (logging.Logger or NoneType) – Logger.
predict_batch(batch, attn_debug, streamer=None)
Predict a batch of sentences.
class eole.predict.Translator(model, vocabs, config, model_config, device_id=0, global_scorer=None, report_score=True, logger=None, return_gold_log_probs=False)
Bases: Inference
predict_batch(batch, attn_debug, streamer=None)
Translate a batch of sentences.
class eole.predict.GeneratorLM(model, vocabs, config, model_config, device_id=0, global_scorer=None, report_score=True, logger=None, return_gold_log_probs=False)
Bases: Inference
predict_batch(batch, attn_debug, scoring=False, streamer=None)
Predict a batch of sentences.
- Parameters:
- batch – Batch of source data.
- attn_debug (bool) – Whether to return attention weights.
- scoring (bool) – Whether to run in scoring mode.
- streamer (GenerationStreamer , optional) – If provided, tokens are pushed to the streamer at each decoding step to enable token-by-token output streaming.
class eole.predict.Encoder(model, vocabs, config, model_config, device_id=0, global_scorer=None, report_score=True, logger=None, return_gold_log_probs=False)
Bases: Inference
predict_batch(batch, attn_debug, streamer=None)
Predict a batch of sentences.
class eole.predict.AudioPredictor(model, vocabs, config, model_config, device_id=0, global_scorer=None, report_score=True, logger=None, return_gold_log_probs=False)
Bases: Translator
Translator subclass for audio encoder-decoder models.
Adds:
- Token suppression (suppress_tokens from eole config)
- Forced decoder prefix (SOT, language, task tokens)
- Sequential timestamp-seeking: decodes audio windows using timestamp tokens to determine seek advancement
- Configurable timestamp output: none (plain text), segment (JSON), word
predict_batch(batch, attn_debug, streamer=None)
Override to inject decoder prefix tensor into batch.
Streaming
class eole.predict.streamer.GenerationStreamer(vocabs, transform_pipe=None, timeout: float = 120.0)
Bases: object
Streamer for token-by-token generation output.
Tokens are put into a thread-safe queue by the generation loop and can be consumed as a Python iterator. The streamer handles incremental detokenization so that consumers receive human-readable text chunks.
This is primarily designed for use with GeneratorLM (decoder-only
LLM models). For best results, use with batch_size=1.
- Parameters:
- vocabs (dict) – Vocabulary dictionaries from the model.
- transform_pipe (TransformPipe, optional) – Transform pipeline for detokenization. When provided (typical for HuggingFace / id-tokenization models), full-sequence incremental decoding is used to yield clean text. When None, tokens are looked up directly in the vocabulary.
- timeout (float) – Maximum seconds to wait for the next token before the iterator stops. Default is 120.0.
Example usage:

```python
import threading
from eole.inference_engine import InferenceEnginePY
from eole.predict.streamer import GenerationStreamer

engine = InferenceEnginePY(config)
streamer = GenerationStreamer(engine.predictor.vocabs, engine.transform_pipe)

def run():
    engine.infer_list(["Hello, how are you?"], streamer=streamer)

thread = threading.Thread(target=run, daemon=True)
thread.start()

for chunk in streamer:
    print(chunk, end="", flush=True)

thread.join()
```
end()
Signal that generation is complete.
Must be called once by the inference thread after the last token has been put, so that the consumer iterator can terminate cleanly.
put(token_ids)
Add newly generated token IDs to the stream.
Called by the generation loop after each decoding step.
- Parameters:
token_ids – A 1-D tensor or list of shape (batch_size,) containing the token IDs produced at the current step. Only the first element is used for streaming.
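The put()/end()/iterator contract described above boils down to a thread-safe queue with a sentinel. The following is a minimal illustrative sketch of that pattern, not the EOLE class itself:

```python
import queue
import threading

class MiniStreamer:
    """Queue-backed streamer sketch: producer calls put()/end(),
    consumer iterates until the end sentinel arrives."""
    _END = object()  # sentinel marking end of generation

    def __init__(self, timeout=120.0):
        self.q = queue.Queue()
        self.timeout = timeout

    def put(self, chunk):
        self.q.put(chunk)

    def end(self):
        # Must be called once after the last chunk so the iterator terminates.
        self.q.put(self._END)

    def __iter__(self):
        while True:
            item = self.q.get(timeout=self.timeout)
            if item is self._END:
                return
            yield item

s = MiniStreamer()

def produce():
    for t in ["Hel", "lo", "!"]:
        s.put(t)
    s.end()

threading.Thread(target=produce).start()
print("".join(s))  # Hello!
```

The timeout on `q.get` is what makes the consumer iterator stop instead of hanging forever if the producer thread dies without calling end().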
Decoding Strategies
class eole.predict.decode_strategy.DecodeStrategy(pad, bos, eos, unk, start, batch_size, parallel_paths, global_scorer, min_length, block_ngram_repeat, exclusion_tokens, return_attention, max_length, ban_unk_token, add_estimator)
Bases: object
Base class for generation strategies.
- Parameters:
- pad (int) – Magic integer in output vocab.
- bos (int) – Magic integer in output vocab.
- eos (int) – Magic integer in output vocab.
- unk (int) – Magic integer in output vocab.
- start (int) – Magic integer in output vocab.
- batch_size (int) – Current batch size.
- parallel_paths (int) – Decoding strategies like beam search use parallel paths. Each batch is repeated parallel_paths times in relevant state tensors.
- min_length (int) – Shortest acceptable generation, not counting begin-of-sentence or end-of-sentence.
- max_length (int) – Longest acceptable sequence, not counting begin-of-sentence (presumably there has been no EOS yet if max_length is used as a cutoff).
- ban_unk_token (bool) – Whether the unk token is forbidden.
- block_ngram_repeat (int) – Block beams where block_ngram_repeat-grams repeat.
- exclusion_tokens (set[int]) – If a gram contains any of these tokens, it may repeat.
- return_attention (bool) – Whether to work with attention too. If this is true, it is assumed that the decoder is attentional.
- Variables:
- pad (int) – See above.
- bos (int) – See above.
- eos (int) – See above.
- unk (int) – See above.
- start (int) – See above.
- predictions (list[list[LongTensor]]) – For each batch, holds a list of beam prediction sequences.
- scores (list[list[FloatTensor]]) – For each batch, holds a list of scores.
- attention (list[list[FloatTensor or list]]) – For each batch, holds a list of attention sequence tensors (or empty lists) having shape (step, inp_seq_len), where inp_seq_len is the length of the sample (not the max length of all input sequences).
- alive_seq (LongTensor) – Shape (B x parallel_paths, step). This sequence grows in the step axis on each call to advance().
- is_finished (ByteTensor or NoneType) – Shape (B, parallel_paths). Initialized to None.
- alive_attn (FloatTensor or NoneType) – If tensor, shape is (B x parallel_paths, step, inp_seq_len), where inp_seq_len is the (max) length of the input sequence.
- target_prefix (LongTensor or NoneType) – If tensor, shape is (B x parallel_paths, prefix_seq_len), where prefix_seq_len is the (max) length of the prefixed prediction.
- min_length (int) – See above.
- max_length (int) – See above.
- ban_unk_token (bool) – See above.
- block_ngram_repeat (int) – See above.
- exclusion_tokens (set *[*int ]) – See above.
- return_attention (bool) – See above.
- done (bool) – See above.
advance(log_probs, attn)
DecodeStrategy subclasses should override advance().
Advance is used to update self.alive_seq, self.is_finished,
and, when appropriate, self.alive_attn.
block_ngram_repeats(log_probs)
We prevent the beam from going in any direction that would repeat any ngram of size <block_ngram_repeat> more than once.
The way we do it: we maintain a list of all ngrams of size <block_ngram_repeat> that is updated each time the beam advances, and manually put any token that would lead to a repeated ngram to 0.
This improves on the previous version’s complexity:
- previous version’s complexity: batch_size * beam_size * len(self)
- current version’s complexity: batch_size * beam_size
This also improves on the previous version’s accuracy:
- Previous version blocks the whole beam, whereas here we only block specific tokens.
- Before the prediction would fail when all beams contained repeated ngrams. This is sure to never happen here.
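The bookkeeping described above can be sketched over plain Python structures: maintain the set of n-grams already emitted on a beam, and zero out (set to -inf) any candidate token that would complete one of them. This is an illustrative sketch, not EOLE's batched tensor implementation; all names are hypothetical.

```python
NEG_INF = float("-inf")

def block_repeats(seq, log_probs, n):
    """seq: tokens generated so far on one beam.
    log_probs: dict mapping candidate token -> score.
    Returns scores with repeat-inducing tokens set to -inf."""
    # All n-grams already present in the sequence.
    seen = set()
    for i in range(len(seq) - n + 1):
        seen.add(tuple(seq[i : i + n]))
    # The (n-1)-token prefix the next token would extend.
    prefix = tuple(seq[-(n - 1):]) if n > 1 else ()
    blocked = dict(log_probs)
    for tok in log_probs:
        if prefix + (tok,) in seen:
            blocked[tok] = NEG_INF  # would repeat an existing n-gram
    return blocked

seq = ["a", "b", "c", "a", "b"]
scores = {"a": -1.0, "b": -2.0, "c": -0.5, "d": -3.0}
out = block_repeats(seq, scores, n=3)
print(out["c"])  # -inf: "a b c" already occurred
print(out["d"])  # -3.0: unaffected
```

Because only the offending tokens are masked rather than the whole beam, the beam always retains at least one viable continuation, which is exactly the robustness property claimed above.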
initialize(device=None, target_prefix=None)
DecodeStrategy subclasses should override initialize().
initialize() should be called before all actions; it prepares the necessary ingredients for decoding.
maybe_update_forbidden_tokens()
We complete and reorder the list of forbidden_tokens.
maybe_update_target_prefix(select_index)
We update / reorder target_prefix for alive paths.
target_prefixing(log_probs)
Fix the first part of predictions with self.target_prefix.
Args:
log_probs (FloatTensor): logits of size (B, vocab_size).
Returns:
log_probs (FloatTensor): modified logits in (B, vocab_size).
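Prefix forcing amounts to masking the distribution while the current step is still inside the target prefix: every token except the forced one gets -inf. A minimal per-sequence sketch (the real method works on batched (B, vocab_size) tensors; names here are hypothetical):

```python
NEG_INF = float("-inf")

def force_prefix(log_probs, step, prefix):
    """log_probs: list of per-token scores for one sequence.
    prefix: list of token IDs that must start the prediction."""
    if step >= len(prefix):
        return log_probs  # past the prefix: decode freely
    forced = prefix[step]
    return [lp if tok == forced else NEG_INF
            for tok, lp in enumerate(log_probs)]

scores = [-0.5, -1.0, -2.0, -0.1]
# At step 0 the prefix [2, 1] forces token 2; all other scores become -inf.
print(force_prefix(scores, step=0, prefix=[2, 1]))
```

Since argmax/sampling over the masked scores can only pick the forced token, any decoding strategy built on top of this automatically honors the prefix.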
update_finished()
DecodeStrategy subclasses should override update_finished().
update_finished is used to update self.predictions,
self.scores, and other “output” attributes.
class eole.predict.beam_search.BeamSearchBase(beam_size, batch_size, pad, bos, eos, unk, start, n_best, global_scorer, min_length, max_length, return_attention, block_ngram_repeat, exclusion_tokens, stepwise_penalty, ratio, ban_unk_token, add_estimator=False)
Bases: DecodeStrategy
Generation beam search.
Note that the attributes list is not exhaustive. Rather, it highlights
tensors to document their shape. (Since the state variables’ “batch”
size decreases as beams finish, we denote this axis with a B rather than
batch_size).
- Parameters:
- beam_size (int) – Number of beams to use (see base parallel_paths).
- batch_size (int) – See base.
- pad (int) – See base.
- bos (int) – See base.
- eos (int) – See base.
- unk (int) – See base.
- start (int) – See base.
- n_best (int) – Don’t stop until at least this many beams have reached EOS.
- global_scorer (eole.predict.GNMTGlobalScorer) – Scorer instance.
- min_length (int) – See base.
- max_length (int) – See base.
- return_attention (bool) – See base.
- block_ngram_repeat (int) – See base.
- exclusion_tokens (set *[*int ]) – See base.
- Variables:
- _batch_offset (LongTensor) – Shape (B,).
- _beam_offset (LongTensor) – Shape (batch_size x beam_size,).
- alive_seq (LongTensor) – See base.
- topk_log_probs (FloatTensor) – Shape (B, beam_size). These are the scores used for the topk operation.
- src_len (LongTensor) – Lengths of encodings. Used for masking attentions.
- select_indices (LongTensor or NoneType) – Shape (B x beam_size,). This is just a flat view of _batch_index.
- topk_scores (FloatTensor) – Shape (B, beam_size). These are the scores a sequence will receive if it finishes.
- topk_ids (LongTensor) – Shape (B, beam_size). These are the word indices of the topk predictions.
- _batch_index (LongTensor) – Shape (B, beam_size).
- _prev_penalty (FloatTensor or NoneType) – Shape (B, beam_size). Initialized to None.
- _coverage (FloatTensor or NoneType) – Shape (1, B x beam_size, inp_seq_len).
- hypotheses (list[list[Tuple[Tensor]]]) – Contains tuples of score (float), sequence (long), and attention (float or None).
advance(log_probs, attn)
DecodeStrategy subclasses should override advance().
Advance is used to update self.alive_seq, self.is_finished,
and, when appropriate, self.alive_attn.
initialize(*args, **kwargs)
DecodeStrategy subclasses should override initialize().
initialize() should be called before all actions; it prepares the necessary ingredients for decoding.
update_finished()
DecodeStrategy subclasses should override update_finished().
update_finished is used to update self.predictions,
self.scores, and other “output” attributes.
eole.predict.greedy_search.sample_with_temperature(logits, temperature, top_k, top_p)
Select next tokens randomly from the top k possible next tokens.
Samples from a categorical distribution over the top_k words using
the category probabilities logits / temperature.
- Parameters:
- logits (FloatTensor) – Shaped (batch_size, vocab_size). These can be logits ((-inf, inf)) or log-probs ((-inf, 0]). (The distribution actually uses the log-probabilities logits - logits.logsumexp(-1), which equals the logits if they are log-probabilities summing to 1.)
- temperature (float) – Used to scale down logits. The higher the value, the more likely it is that a non-max word will be sampled.
- top_k (int) – This many words could potentially be chosen. The other logits are set to have probability 0.
- top_p (float) – Keep the most likely words until the cumulative probability is greater than p. If used with top_k, both conditions are applied.
- Returns:
- topk_ids – Shaped (batch_size, 1). These are the sampled word indices in the output vocab.
- topk_scores – Shaped (batch_size, 1). These are essentially (logits / temperature)[topk_ids].
- Return type: (LongTensor, FloatTensor)
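The combination of temperature scaling and top-k truncation can be sketched over plain Python lists (the real function operates on batched FloatTensors and additionally supports top-p; every name here is illustrative):

```python
import math
import random

def sample(logits, temperature=1.0, top_k=0, rng=random):
    """Temperature + top-k sampling for a single distribution."""
    # Scale logits: higher temperature flattens the distribution.
    scaled = [l / temperature for l in logits]
    if top_k > 0:
        # Keep only the top_k scores; the rest get probability 0.
        cutoff = sorted(scaled, reverse=True)[top_k - 1]
        scaled = [s if s >= cutoff else float("-inf") for s in scaled]
    # Softmax over the surviving logits (exp(-inf) == 0.0).
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw one index from the categorical distribution.
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i
    return len(probs) - 1

random.seed(0)
idx = sample([2.0, 1.0, 0.1, -1.0], temperature=0.7, top_k=2)
print(idx)  # one of the two highest-scoring indices (0 or 1)
```

With top_k=2 only the two best tokens can ever be drawn, while lowering the temperature below 1.0 further concentrates probability mass on the best one.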
Scoring
class eole.predict.penalties.PenaltyBuilder(cov_pen, length_pen)
Bases: object
Returns the Length and Coverage Penalty function for Beam Search.
- Parameters:
- length_pen (str) – option name of length pen
- cov_pen (str) – option name of cov pen
- Variables:
- has_cov_pen (bool) – Whether coverage penalty is None (applying it is a no-op). Note that the converse isn’t true: setting beta to 0 should force the coverage penalty to be a no-op.
- has_len_pen (bool) – Whether length penalty is None (applying it is a no-op). Note that the converse isn’t true: setting alpha to 1 should force the length penalty to be a no-op.
- coverage_penalty (Callable[[FloatTensor, float], FloatTensor]) – Calculates the coverage penalty.
- length_penalty (Callable[[int, float], float]) – Calculates the length penalty.
coverage_none(cov, beta=0.0)
Returns zero as penalty
coverage_summary(cov, beta=0.0)
Our summary penalty.
coverage_wu(cov, beta=0.0)
GNMT coverage re-ranking score.
See “Google’s Neural Machine Translation System”.
cov is expected to be sized (*, seq_len), where * is
probably batch_size x beam_size but could be several
dimensions like (batch_size, beam_size). If cov is attention,
then the seq_len axis probably sums to (almost) 1.
length_average(cur_len, alpha=1.0)
Returns the current sequence length.
length_none(cur_len, alpha=0.0)
Returns unmodified scores.
length_wu(cur_len, alpha=0.0)
GNMT length re-ranking score.
See “Google’s Neural Machine Translation System”.
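The GNMT length penalty from Wu et al. is commonly implemented as ((5 + cur_len) / 6) ** alpha, while length_average simply returns the current length. A sketch assuming those standard formulas (not a quote of EOLE's exact code):

```python
def length_wu(cur_len, alpha=0.0):
    # GNMT length penalty: grows with sequence length when alpha > 0.
    return ((5 + cur_len) / 6.0) ** alpha

def length_average(cur_len, alpha=1.0):
    # Dividing by the length averages the per-token log-prob.
    return cur_len

# A hypothesis's raw log-prob is divided by the penalty when re-ranking,
# so a longer hypothesis with a worse raw score can still win:
print(round(-8.0 / length_wu(10, alpha=1.0), 2))  # -3.2 (length 10)
print(round(-6.0 / length_wu(5, alpha=1.0), 2))   # -3.6 (length 5)
```

With alpha = 0 the penalty is 1.0 for every length, so re-ranking reduces to comparing raw cumulative log-probs.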
class eole.predict.GNMTGlobalScorer(alpha, beta, length_penalty, coverage_penalty)
Bases: object
NMT re-ranking.
- Parameters:
- alpha (float) – Length parameter.
- beta (float) – Coverage parameter.
- length_penalty (str) – Length penalty strategy.
- coverage_penalty (str) – Coverage penalty strategy.
- Variables:
- alpha (float) – See above.
- beta (float) – See above.
- length_penalty (callable) – See penalties.PenaltyBuilder.
- coverage_penalty (callable) – See penalties.PenaltyBuilder.
- has_cov_pen (bool) – See penalties.PenaltyBuilder.
- has_len_pen (bool) – See penalties.PenaltyBuilder.